A statistical approach to crosslingual natural language tasks
Abstract
The existence of huge volumes of documents written in multiple languages on the Internet has led to the investigation of novel approaches for dealing with information of this kind. We propose a statistical approach to crosslingual natural language tasks. In particular, we apply IBM alignment Model 1 to obtain a statistical bilingual dictionary, which can then be used to approximate the relatedness probability of two given documents written in different languages. The experimental results obtained in three different tasks (text classification, information retrieval, and plagiarism analysis) highlight the benefit of the presented statistical approach.
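The idea in the abstract can be illustrated with a minimal sketch of IBM Model 1 training by expectation-maximization, followed by a Model 1-style relatedness score between two documents. This is not the authors' implementation: the function names `train_ibm_model1` and `relatedness`, the toy corpus, the omission of the NULL source word, the length normalization, and the probability floor are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(f | e) by EM from (source_tokens, target_tokens) pairs.

    Simplification (assumption): no NULL source word is added.
    """
    src_vocab = {e for src, _ in pairs for e in src}
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Uniform initialization over the target vocabulary.
    t = {(f, e): 1.0 / len(tgt_vocab) for f in tgt_vocab for e in src_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts per source word.
        for (f, e) in t:
            t[(f, e)] = count[(f, e)] / total[e] if total[e] > 0 else 0.0
    return t


def relatedness(doc_f, doc_e, t, floor=1e-9):
    """Length-normalized Model 1 log-likelihood of doc_f given doc_e.

    The floor (an assumption) avoids log(0) for unseen word pairs.
    """
    if not doc_f or not doc_e:
        return float("-inf")
    score = 0.0
    for f in doc_f:
        p = sum(t.get((f, e), 0.0) for e in doc_e) / len(doc_e)
        score += math.log(max(p, floor))
    return score / len(doc_f)


# Toy English-German corpus, invented purely for illustration.
toy_pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
t = train_ibm_model1(toy_pairs)
related = relatedness(["das", "haus"], ["the", "house"], t)
unrelated = relatedness(["das", "haus"], ["a", "book"], t)
```

In this sketch the learned table `t` plays the role of the statistical bilingual dictionary, and a higher `relatedness` value suggests that the two documents are more likely to be translations of, or topically close to, each other. A real system would train on a large parallel corpus and would likely smooth the probabilities rather than rely on a fixed floor.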
Similar resources
ITRI-03-13 CROCODIAL: Crosslingual Computer-mediated Dialogue
We describe a novel approach to crosslingual dialogue which allows for highly accurate communication of semantically complex content. The approach is introduced through an application in a B2B scenario. We are currently building a browser-based prototype for this scenario. The core technology underlying the approach is natural language generation. We also discuss how the proposed approach can c...
CROCODIAL: Crosslingual Computer-mediated Dialogue
We describe a novel approach to crosslingual dialogue which allows for highly accurate communication of semantically complex content. The approach is introduced through an application in a B2B scenario. We are currently building a browser-based prototype for this scenario. The core technology underlying the approach is natural language generation. We also discuss how the proposed approach can c...
Inducing Crosslingual Distributed Representations of Words
Distributed representations of words have proven extremely useful in numerous natural language processing tasks. Their appeal is that they can help alleviate data sparsity problems common to supervised learning. Methods for inducing these representations require only unlabeled language data, which are plentiful for many natural languages. In this work, we induce distributed representations for ...
Crosslingual Distributed Representations of Words
Distributed representations of words have proven extremely useful in numerous natural language processing tasks. Their appeal is that they can help alleviate data sparsity problems common to supervised learning. Methods for inducing these representations require only unlabeled language data, which are plentiful for many natural languages. In this work, we induce distributed representations for ...
Borrowing Language Resources for Development of Automatic Speech Recognition for Low- and Middle-Density Languages
In this paper we describe an approach that both creates crosslingual acoustic monophone model sets for speech recognition tasks and objectively predicts their performance without target-language speech data or acoustic measurement techniques. This strategy is based on a series of linguistic metrics characterizing the articulatory phonetic and phonological distances of target-language phonemes f...
Journal title:
- J. Algorithms
Volume 64 Issue
Pages -
Publication date 2008